Lower bandwidth used for topology refresh #5618


Open · wants to merge 13 commits into base: develop

Conversation

@jmwample (Contributor) commented Mar 13, 2025

Prevents re-pulling node descriptors that are already known while using the TopologyRefresher

What does this do

  • Adds an endpoint to the Nym API that allows pulling batches of node descriptors by node_id
  • Creates a TopologyProvider that caches topologies and only goes to the network if the known topology is older than the configured cache TTL.
  • When refreshing the topology, we compare the node IDs in the cache to the node IDs in the refreshed layer assignments.
    • If the epoch has not changed, no descriptors are pulled.
    • If the epoch has changed, layer assignments are pulled and updated in the cached topology.
    • Only descriptors for node IDs in the new layer assignments that are NOT in the known topology cache are pulled.
  • Adds a Performance field to the stored RoutingNode in the cached topology. This way the cache contains all node descriptors, and queries simply filter based on the stored performance, as opposed to filtering before storing and downloading more descriptors.
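The refresh diff described above boils down to a set difference between cached and newly assigned node IDs. A minimal sketch (the type and function names here are illustrative, not the PR's actual types):

```rust
use std::collections::HashSet;

// Hypothetical alias; the real code uses its own NodeId type.
type NodeId = u32;

/// Given the node IDs already present in the cached topology and the node
/// IDs appearing in the freshly pulled layer assignments, return only the
/// IDs whose descriptors actually need to be fetched from the Nym API.
fn descriptors_to_fetch(cached: &HashSet<NodeId>, assigned: &HashSet<NodeId>) -> Vec<NodeId> {
    let mut missing: Vec<NodeId> = assigned.difference(cached).copied().collect();
    missing.sort_unstable(); // deterministic order for batched requests
    missing
}

fn main() {
    let cached: HashSet<NodeId> = [1, 2, 3].into_iter().collect();
    let assigned: HashSet<NodeId> = [2, 3, 4, 5].into_iter().collect();
    // Only 4 and 5 are unknown, so only their descriptors would be pulled.
    assert_eq!(descriptors_to_fetch(&cached, &assigned), vec![4, 5]);
}
```

If the epoch is unchanged, the assignment set equals the cached set and the difference is empty, which is exactly the "no descriptors are pulled" case above.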

What does this not do

  • Each created MixnetClient still uses an independent TopologyProvider and cached topology. This PR does not create a shared topology provider that can be used for every new connection.
  • The first refresh pulls the full topology. This PR does not pre-cache or add any topology serialization/deserialization that would prevent the initial pull.
  • This PR does not change the stage at which the initial topology is pulled. As is, the TopologyRefresher is created as a MixnetClient is starting up. This means that the initial topology pull doesn't happen until we have already started the process of connecting to the mixnet.
  • This does not affect the topology information used by the VPN client. That is a different API with different data and happens in the VPN client repo.


@jmwample jmwample requested a review from octol as a code owner March 13, 2025 20:48
@jmwample jmwample added this to the Chuckles milestone Mar 13, 2025
@jmwample jmwample force-pushed the jmwample/topo-refresh branch from 53c2e7e to e7d88c3 Compare March 13, 2025 21:05

@octol (Contributor) left a comment
Great work! Would be good to have @jstuczyn quickly skim it as well though.

/// Topology Provider built around a cached piecewise provider that uses the Nym API to
/// fetch changes and node details.
#[derive(Clone)]
pub struct NymApiTopologyProvider {
Contributor commented:

What about using something like NymApiSmartTopologyProvider? Having two types with the same name was a bit confusing initially when reading the code.

pub fn gateways(&self) -> impl Iterator<Item = &RoutingNode> {
self.node_details.values().filter(|n| {
self.rewarded_set.entry_gateways.contains(&n.node_id)
|| self.rewarded_set.exit_gateways.contains(&n.node_id)
Contributor commented:

Would be good to confirm with @jstuczyn that this is correct?

Contributor replied:

No, it's not. But it's easy to miss because we've been going back and forth about this for quite a while. The current consensus is that you can use any node that supports gateway mode, regardless of whether it's in the active set or not.
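A filter matching that consensus might look something like the sketch below. The struct and the supports_gateway_mode flag are stand-ins for illustration, not the actual fields of the PR's RoutingNode:

```rust
// Illustrative stand-in for the real RoutingNode type.
#[derive(Debug)]
struct RoutingNode {
    node_id: u32,
    supports_gateway_mode: bool, // hypothetical capability flag
}

/// Any node that supports gateway mode qualifies; there is deliberately no
/// check against the rewarded set's entry/exit gateway membership.
fn gateways(nodes: &[RoutingNode]) -> impl Iterator<Item = &RoutingNode> {
    nodes.iter().filter(|n| n.supports_gateway_mode)
}

fn main() {
    let nodes = vec![
        RoutingNode { node_id: 1, supports_gateway_mode: true },
        RoutingNode { node_id: 2, supports_gateway_mode: false },
    ];
    let ids: Vec<u32> = gateways(&nodes).map(|n| n.node_id).collect();
    assert_eq!(ids, vec![1]);
}
```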

@jstuczyn (Contributor) left a comment

A few notes:

  • Only descriptors for node IDs in the new layer assignments that are NOT in the known topology cache are pulled.

Sometimes you might have to pull information about nodes more often than that, regardless of whether they're known or not, because they might have changed their internal configuration (like starting to use a different mix port).

  • You have to pull all gateways, not just the ones in the rewarded set, as clients are allowed to use any of them at any time.

pub fn gateways(&self) -> impl Iterator<Item = &RoutingNode> {
self.node_details.values().filter(|n| {
self.rewarded_set.entry_gateways.contains(&n.node_id)
|| self.rewarded_set.exit_gateways.contains(&n.node_id)
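One way to address the note about known nodes changing their configuration is to record a fetch timestamp per cached descriptor, so even known nodes are re-pulled once their entry is older than a refresh interval. A hedged sketch under that assumption (names are illustrative, not the PR's types):

```rust
use std::time::{Duration, Instant};

// Hypothetical cache entry: just the time the descriptor was last fetched.
struct CachedDescriptor {
    fetched_at: Instant,
}

/// Unknown nodes are always fetched; known nodes are re-fetched once their
/// cached descriptor is older than `max_age`, in case the node's internal
/// configuration (e.g. its mix port) changed since the last pull.
fn needs_refetch(entry: Option<&CachedDescriptor>, max_age: Duration, now: Instant) -> bool {
    match entry {
        None => true,
        Some(d) => now.duration_since(d.fetched_at) >= max_age,
    }
}

fn main() {
    let now = Instant::now();
    let fresh = CachedDescriptor { fetched_at: now };
    assert!(needs_refetch(None, Duration::from_secs(600), now));
    assert!(!needs_refetch(Some(&fresh), Duration::from_secs(600), now));
}
```

This keeps the bandwidth win for recently fetched descriptors while bounding how stale a "known" node's details can get.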

@@ -231,6 +231,8 @@ impl PacketPreparer {
mixnet_entry: false,
mixnet_exit: false,
},
// We have no information about performance in legacy node formats
performance: Percent::hundred(),
Contributor commented:

At this point I guess we should probably use a performance of zero if nodes are still outdated...

pub(super) async fn nodes_basic_batch(
state: State<AppState>,
Query(query_params): Query<NodesParamsWithRole>,
Json(ids): Json<Vec<u32>>,
Contributor commented:

There should be some limit on the number of nodes you can request (because certain people might complain it could lead to DoS).
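The suggested cap could be enforced before the handler does any work, along these lines. The constant, limit value, and error shape are assumptions for illustration, not the actual Nym API behavior:

```rust
// Assumed cap on ids per batch request; the real value would be a tuning choice.
const MAX_BATCH_SIZE: usize = 200;

/// Reject oversized batch requests up front, so a single request cannot ask
/// the API to assemble an unbounded number of node descriptors.
fn validate_batch(ids: &[u32]) -> Result<&[u32], String> {
    if ids.len() > MAX_BATCH_SIZE {
        return Err(format!(
            "requested {} ids, maximum per batch is {}",
            ids.len(),
            MAX_BATCH_SIZE
        ));
    }
    Ok(ids)
}

fn main() {
    assert!(validate_batch(&[1, 2, 3]).is_ok());
    let too_many: Vec<u32> = (0..1000).collect();
    assert!(validate_batch(&too_many).is_err());
}
```

A client that needs more descriptors than the cap allows would simply split its request into multiple batches.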

@octol octol modified the milestones: Chuckles, Tex Mar 20, 2025
@benedettadavico benedettadavico changed the base branch from develop to release/2025.7-tex April 4, 2025 10:59
@benedettadavico benedettadavico modified the milestones: Tex, Tourist Apr 8, 2025
@benedettadavico benedettadavico changed the base branch from release/2025.7-tex to develop April 8, 2025 07:52
@benedettadavico benedettadavico modified the milestones: Tourist, Godiva Apr 22, 2025